Bell inequality violations can be used to certify private randomness for use in cryptographic 
applications. In photonic Bell experiments, a large amount of the data that is generated comes 
from no-detection events and presumably contains little randomness. This raises the question as to 
whether randomness can be extracted only from the smaller post-selected subset corresponding to 
proper detection events, instead of from the entire set of data. This could in principle be feasible 
without opening an analogue of the detection loophole as long as the min-entropy of the 
post-selected data is evaluated by taking all the information into account, including no-detection events. 

The possibility of extracting randomness from a short string has a practical advantage, because it 
reduces the computational time of the extraction. 

Here, we investigate the above idea in a simple scenario, where the devices and the adversary 
behave according to i.i.d. strategies. We show that indeed almost all the randomness is present 
in the pairs of outcomes for which at least one detection happened. We further show that in some 
cases applying a pre-processing on the data can capture features that an analysis based on global 
frequencies only misses, thus resulting in the certification of more randomness. We then briefly 
consider non-i.i.d. strategies and provide an explicit example of such a strategy that is more powerful 
than any i.i.d. one even in the asymptotic limit of infinitely many measurement rounds, something 
that was not reported before in the context of Bell inequalities. 


I. INTRODUCTION 

Sources of randomness have numerous applications: in algorithms, sampling, numerical 
simulations, gambling, and of course cryptography. The last application demands sources that 
can be certified as being uncorrelated to any outside process or variable, i.e. private 
randomness. Typically, the output of a physical process (thermal noise, shot noise, ...) is 
considered random in this sense only if certain assumptions are made on its underlying 
behavior. The violation of Bell inequalities, however, certifies private randomness in a 
device-independent way. From the amount of violation, one obtains a lower bound on the 
min-entropy H of the output string generated by the process. This information is then 
sufficient to extract randomness: indeed, one can design seeded extractors, whose output is a 
string of (roughly) H bits guaranteed to be uniformly random, even according to an external 
adversary. 

A Bell experiment, however, produces much more information than the mere violation of a 
single inequality. For instance, one can estimate the single-run frequencies p(a, b|x, y) of 
the outcomes (a, b) conditioned on the settings (x, y). When this knowledge is taken into 
account, higher values for the lower bounds on H can in principle be obtained. More generally, 
there may be other ways to process the data that can lead to improved bounds on the 
randomness, as the following example illustrates. 

Consider a Bell experiment running for two days, each day consisting of $N \gg 1$ runs. 
Suppose that, on the first day, the setup produces outcomes that violate the CHSH inequality 
maximally; on the second day, owing to some technical glitch, the detectors do not fire, so the 
list of outcomes consists only of double no-detection events. Suppose that the users estimate 
the amount of randomness generated using solely the observed CHSH violation $I$, using the 
simple bound $H \geq 1 - \log_2\left(1 + \sqrt{2 - I^2/4}\right)$ [5]. Suppose further that 
they planned to extract randomness every two days. Over the two-day period, they observe an 
average CHSH violation of $(2\sqrt{2} + 2)/2 \approx 2.41$ (we take the convention that 
no-detection events are mapped to +1 outcomes), from which they deduce a randomness rate of 
$\approx 0.2$ bit/run for Alice’s outcomes, that is $\approx 0.4N$ bits in total for the 
two-day period. However, the users might have chosen to extract randomness at the end of each 
day instead. The same techniques now certify 1 bit/run for Alice on the first day and 0 on the 
second, for a total of $N$ bits over the two days [10]. What happened is clear: the data 
contain the information that two processes are involved; this information was missed by the 
overall analysis, but was revealed by the choice of sorting the data in two blocks. 
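
The arithmetic of this two-day example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `chsh_min_entropy` and the value of `N` are our own choices, and the bound used is the simple CHSH bound quoted above.

```python
import math

def chsh_min_entropy(chsh_value):
    """Simple bound H >= 1 - log2(1 + sqrt(2 - I^2/4)) on Alice's min-entropy per run."""
    if chsh_value <= 2.0:
        return 0.0  # no violation, no certified randomness
    # max(0, .) guards against tiny negative values from floating-point rounding
    radicand = max(0.0, 2.0 - chsh_value ** 2 / 4.0)
    return 1.0 - math.log2(1.0 + math.sqrt(radicand))

N = 1_000_000                   # runs per day (illustrative value)
I_day1 = 2.0 * math.sqrt(2.0)   # maximal CHSH violation on day one
I_day2 = 2.0                    # all no-detections, mapped to +1 outcomes
I_pooled = (I_day1 + I_day2) / 2.0

rate_pooled = chsh_min_entropy(I_pooled)
bits_pooled = 2 * N * rate_pooled
bits_daily = N * chsh_min_entropy(I_day1) + N * chsh_min_entropy(I_day2)

print(f"pooled CHSH value: {I_pooled:.3f}")
print(f"pooled rate: {rate_pooled:.3f} bit/run, total {bits_pooled / N:.2f} N bits")
print(f"day-by-day total: {bits_daily / N:.2f} N bits")
```

Processing the two days separately certifies about N bits, against roughly 0.4N bits when the data are pooled.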

The example is extreme, but a simple variation is very relevant: the case in which 
no-detection events are evenly spread during the whole duration of the experiment is a good 
approximation to the data produced in photonic Bell tests, in which no-detection events 
constitute a large fraction of the runs. No-detection events come from two processes: the 
finite efficiency of the detectors, and the fact that parametric down-conversion often 
produces the vacuum state. The physics of both suggests that these events contain little or no 
randomness: it is thus tempting to sort the outcomes of the Bell test in two groups, the 
detections and the no-detections. As in the previous example, this may lead to certifying more 
randomness. Even if it does not, one may get a practical advantage by extracting randomness 
only from the detection events. Indeed, randomness extractors require an independent random 
seed: the longer the initial string, the longer the needed seed and the computational time to 
output the result; in fact, it is an active research direction to construct randomness 
extractors with short seed length [3]. Thus, it is beneficial to be able to extract randomness 
from a short string. 

Here, we investigate the amount of randomness that can be certified in Bell tests within the 
subset of detection events. For this first study, our aim is simply to determine whether this 
is actually a viable strategy. We thus perform our analysis in the simplified scenario in 
which the devices and the adversary behave in an i.i.d. way and in the limit of infinitely 
many measurement rounds. If randomness cannot be certified in this simple scenario, then it 
can certainly not be certified in the non-i.i.d., finite-statistics case either. 

The post-selection of detection events notoriously opens the detection loophole. It is 
important to clarify that our approach does not fall into that trap. We shall compute a lower 
bound on the randomness that can be extracted from a subset of events, but the bound is 
obtained by taking into account the whole set of events. In particular, if the behavior of the 
devices is compatible with local realism due to the detection loophole, our method will say 
that no randomness can be certified in the post-selected set of detection events. 

Let us remark that a similar analysis in the context of violating local realism, namely of 
the p-value of post-selected events that do not contain no-detections, has been carried out 
recently. 

After introducing the technique that we will use to bound randomness in Section II, we apply 
it to several physically-motivated examples in Section III. In Section IV we analyse more 
precisely the effect of post-selection in a simplified case. A glimpse beyond the i.i.d. 
restriction is given in Section V before the conclusion. 


II. AVERAGE RANDOMNESS IN 
POST-SELECTED EVENTS 

Consider a Bell experiment consisting of two separate devices in which the parties input 
$x \in \mathcal{X}$ and $y \in \mathcal{Y}$ and obtain outputs $a \in \mathcal{A}$ and 
$b \in \mathcal{B}$, respectively. The behavior of such devices over $n$ successive runs can 
be characterized by the (generally unknown) joint probabilities $p(\mathbf{ab}|\mathbf{xy})$ 
to obtain the output string $\mathbf{ab} = (a_1 b_1, \ldots, a_n b_n)$ given the input string 
$\mathbf{xy} = (x_1 y_1, \ldots, x_n y_n)$. The information that an adversary has about the 
output string can be characterized by a tripartite quantum distribution 
$p(\mathbf{abe}|\mathbf{xyz})$, where $\mathbf{e}$ denotes the output the adversary obtains 
when he makes a measurement $\mathbf{z}$ on a system possibly entangled with Alice and Bob’s 
devices. In general $\mathbf{e}$ can be a string of arbitrary size representing the total 
information that the adversary can get about Alice and Bob’s outcomes, and $\mathbf{z}$ can be 
an arbitrary measurement that depends on the information available to the adversary in the 
protocol before his measurement. 

Here we shall make the following simplifying assumptions. First, we will assume that the 
devices behave in an i.i.d. way and similarly that the adversary extracts his information in 
an i.i.d. way by performing at each run individual measurements $z_i$. We can thus write 
$p(\mathbf{abe}|\mathbf{xyz}) = \prod_{i=1}^{n} p(a_i b_i e_i|x_i y_i z_i)$. Second, we are 
going to assume that Alice and Bob’s marginals $p(ab|xy)$ at each run are known and given. In 
this way, we do not need to take care of estimation. With these assumptions, finding the 
adversary’s optimal attack amounts to optimizing some quantity over all tripartite quantum 
distributions $p(abe|xyz) = \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes M_{e|z} |\Psi\rangle$ 
compatible with a given bipartite marginal 
$p(ab|xy) = \sum_e p(abe|xyz) = \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes I |\Psi\rangle$. 

Let us now introduce the additional ingredient of post-selection. For this, we consider a 
bipartition of the joint output alphabet $\mathcal{O} = \mathcal{A} \times \mathcal{B}$ into 
two sets $\mathcal{V}$ (valid symbols) and $\mathcal{N}$. If the outputs at a given round 
satisfy $(a, b) \in \mathcal{V}$, we say that the round is valid, and otherwise, if 
$(a, b) \in \mathcal{N}$, that it is invalid. We refer to the events obtained in valid runs 
only as the post-selected events. Our goal is to estimate how much randomness can be extracted 
from these post-selected events. 

A priori, an adversary trying to guess the post-selected events might not have access to the 
information about which run turned out to be valid or invalid, since he should not have access 
to the outputs observed by the parties. For simplicity, however, we will assume here that the 
adversary has access to this information. This allows him to know exactly which run he should 
try to guess and is thus advantageous for him. The amount of randomness that can be certified 
in this case thus constitutes a lower bound on the amount that can be certified when the 
adversary is not given this information. This assumption might however be problematic in a 
non-i.i.d. situation (see Section V). 

We are going to assume in the following that Alice and Bob use a certain pair of inputs 
(x, y) for randomness generation [15]. Since there is a promise on the marginal $p(ab|xy)$ and 
since we do not need to consider how to estimate this quantity, we are going to assume for 
simplicity that Alice and Bob always measure their systems using the inputs (x, y). Suppose 
that by measuring $n$ systems, they obtain $m$ results in $\mathcal{V}$ and $n - m$ results in 
$\mathcal{N}$. The number $m$ of valid results is a random variable with probability 
distribution $p(m) = \binom{n}{m} p_{xy}^{m} (1 - p_{xy})^{n-m}$, where 
$p_{xy} = \sum_{ab \in \mathcal{V}} p(ab|xy)$ is the single-run probability to obtain a pair 
of valid results when using inputs (x, y). 

By the i.i.d. assumption, the min-entropy of the $m$-element post-selected string is 
$m H_{xy}$, where $H_{xy}$ is the single-run min-entropy and is defined below. Applying a 
randomness extractor to this string then yields $m H_{xy}$ bits of randomness (such extractors 
exist up to $\epsilon$ corrections). The average length of the final random string is then 
$\sum_{m=0}^{n} p(m)\, m H_{xy} = n\, p_{xy} H_{xy}$. We can also interpret this last quantity 
as an “average” min-entropy. The rate of randomness extraction per use of the device can then 
be defined as $p_{xy} H_{xy}$. 
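
The averaging step can be verified numerically. The sketch below sums the binomial distribution of the number of valid runs explicitly and compares the result with the closed form; the function name and the numerical values of `n`, `p_valid` and `h_single` are arbitrary illustrative choices, not values from the analysis.

```python
from math import comb

def average_extracted_length(n, p_valid, h_single):
    """Average of m * H over m ~ Binomial(n, p_valid): sum_m p(m) * m * H."""
    return sum(
        comb(n, m) * p_valid ** m * (1 - p_valid) ** (n - m) * m * h_single
        for m in range(n + 1)
    )

n, p_valid, h_single = 200, 0.3, 0.8   # illustrative values only
average_length = average_extracted_length(n, p_valid, h_single)
closed_form = n * p_valid * h_single   # n * p_xy * H_xy

print(average_length, closed_form)
```

Dividing by n gives the randomness rate per use of the device.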

To complete the analysis, it remains to determine $H_{xy}$. By definition, the min-entropy is 
related to the guessing probability $G_{xy}$ as $H_{xy} = -\log_2 G_{xy}$, where the guessing 
probability is the maximal probability that the adversary correctly guesses Alice and Bob’s 
outputs by performing an optimal measurement on his quantum side information [19]. Here, since 
we condition on valid runs, this quantum side information can be represented by the cq-state 
$\rho_{ABE} = \frac{1}{p_{xy}} \sum_{ab \in \mathcal{V}} |ab\rangle\langle ab| \otimes \rho_E^{ab}$, 
where $\rho_E^{ab} = \mathrm{tr}_{AB}\left(M_{a|x} \otimes M_{b|y} \otimes I\, |\Psi\rangle\langle\Psi|\right)$. 
The probability that the adversary then makes a correct guess $e = (a, b)$ of Alice and Bob’s 
outputs $a, b$ by performing a measurement $z$ on his system is, averaged over Alice and Bob’s 
possible outputs, 
$\frac{1}{p_{xy}} \sum_{ab \in \mathcal{V}} \mathrm{tr}\left(M_{ab|z}\, \rho_E^{ab}\right) = \frac{1}{p_{xy}} \sum_{ab \in \mathcal{V}} \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes M_{ab|z} |\Psi\rangle$. 
To determine the maximal value of this guessing probability, we should maximize it over all 
quantum realizations $R = (|\Psi\rangle, \{M_{a|x}\}, \{M_{b|y}\}, \{M_{e|z}\})$ compatible 
with the given marginals $p(ab|xy)$ characterizing Alice and Bob’s devices. We thus have 

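$$
\begin{aligned}
G_{xy} = \frac{1}{p_{xy}} \max_{R} \quad & \sum_{ab \in \mathcal{V}} \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes M_{ab|z} |\Psi\rangle \\
\text{s.t.} \quad & \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes I |\Psi\rangle = p(ab|xy).
\end{aligned}
\tag{1}
$$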

Following [5] and introducing the bipartite subnormalized quantum correlations 
$\tilde{p}_{a'b'}(ab|xy) = \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes M_{a'b'|z} |\Psi\rangle$, 
where $z$ denotes the adversary’s optimal measurement which maximizes (1), the above 
optimization program can be rewritten as 

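$$
\begin{aligned}
G_{xy} = \frac{1}{p_{xy}} \max_{\{\tilde{p}_{a'b'}\}} \quad & \sum_{ab \in \mathcal{V}} \tilde{p}_{ab}(ab|xy) \\
\text{s.t.} \quad & \sum_{a'b' \in \mathcal{V}} \tilde{p}_{a'b'}(ab|xy) = p(ab|xy), \\
& \tilde{p}_{a'b'}(ab|xy) \in \tilde{\mathcal{Q}},
\end{aligned}
\tag{2}
$$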

where $\tilde{\mathcal{Q}}$ denotes the set of unnormalized bipartite quantum correlations. 
The meaning of this program is intuitive: Eve prepares one of $|\mathcal{V}|$ systems for 
Alice and Bob, one for each outcome pair $(a'b')$. Each system is characterized by joint 
probabilities $p_{a'b'}(ab|xy) = \tilde{p}_{a'b'}(ab|xy) / q_{a'b'}$ and is prepared with 
probability $q_{a'b'} = \sum_{ab} \tilde{p}_{a'b'}(ab|xy)$. When Eve prepares system $ab$, she 
guesses that Alice’s and Bob’s outputs are $ab$, hence the probability that she guesses 
correctly on average is given by the objective function in (2). Eve’s preparations should of 
course on average reproduce the given correlations $p(ab|xy)$, hence the first constraint of 
(2). The second constraint simply expresses that Eve’s preparations should be compatible with 
quantum theory. 
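
To make the objective function concrete, the following Python sketch evaluates it for a decomposition supplied by hand, in a toy case where every outcome pair is valid and Alice and Bob observe uniformly random bits. The data layout and the function names are our own illustrative choices. The code checks the first constraint only; membership in the quantum set is not tested (both toy decompositions are local, hence quantum), and no maximization is performed.

```python
from itertools import product

OUTCOMES = (0, 1)
SETTINGS = (0, 1)

def total_behaviour(decomposition):
    """Sum the subnormalized pieces: this must reproduce p(ab|xy) (first constraint)."""
    total = {}
    for piece in decomposition.values():
        for key, prob in piece.items():
            total[key] = total.get(key, 0.0) + prob
    return total

def guessing_probability(decomposition, valid, x, y):
    """Objective: (1 / p_xy) * sum over valid (a, b) of p~_{ab}(ab|xy)."""
    total = total_behaviour(decomposition)
    p_valid = sum(total.get((a, b, x, y), 0.0) for (a, b) in valid)
    correct = sum(decomposition.get((a, b), {}).get((a, b, x, y), 0.0) for (a, b) in valid)
    return correct / p_valid

valid = list(product(OUTCOMES, OUTCOMES))
inputs = list(product(SETTINGS, SETTINGS))

# Eve mixes four deterministic systems; system (a', b') always outputs (a', b').
deterministic = {
    (ga, gb): {(ga, gb, x, y): 0.25 for (x, y) in inputs}
    for (ga, gb) in valid
}
# Eve prepares the same uniform system whatever her guess: she learns nothing.
uninformed = {
    guess: {(a, b, x, y): 0.25 * 0.25 for (a, b) in valid for (x, y) in inputs}
    for guess in valid
}

g_deterministic = guessing_probability(deterministic, valid, 0, 0)
g_uninformed = guessing_probability(uninformed, valid, 0, 0)
print(g_deterministic, g_uninformed)
```

Both decompositions reproduce the same uniform behaviour, yet the first lets Eve guess perfectly while the second does no better than a blind guess among four outcome pairs; the program maximizes over all admissible decompositions.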

Notice that the constraints in the second line of (1) and in the second one of (2) involve 
all outputs $a, b$ and not only those belonging to the post-selected set $\mathcal{V}$. This 
reflects the fact that our analysis is not subject to the detection loophole. 

To summarize, for a given set of bipartite correlations $p(ab|xy)$ characterising the 
behavior of the devices, the figure of merit that we are going to consider in this paper, 
which we call the randomness rate, is $p_{xy} H_{xy} = p_{xy} \times (-\log_2 G_{xy})$, where 
$G_{xy}$ is the output of the optimization problem (2). 

In general, it is not possible to carry out this optimization explicitly as there is no 
closed form for the set of quantum correlations. However, we can upper-bound the optimal value 
of (2), and thus lower-bound the randomness rate, through semidefinite programming by relaxing 
the last condition $\tilde{p}_{a'b'}(ab|xy) \in \tilde{\mathcal{Q}}$ and asking that 
$\tilde{p}_{a'b'}(ab|xy)$ belongs to some level of the NPA hierarchy instead of the exact 
quantum set. All optimizations reported here were performed at local level 1 of the SDP 
hierarchy [23]. 

III. APPROXIMATING PHOTONIC 
EXPERIMENTS 

The natural benchmark to test our tools are the correlations expected in a Bell experiment 
using spontaneous parametric down-conversion (SPDC). In the single-mode case, such a pulsed 
SPDC source produces a state of the form 

$$
|\psi\rangle = c(g, \bar{g})\, \exp\left(\tanh(g)\, a_H^\dagger b_V^\dagger - \tanh(\bar{g})\, a_V^\dagger b_H^\dagger\right) |0\rangle, \tag{3}
$$

where $a_{H/V}$ ($b_{H/V}$) are polarization modes for Alice (Bob), $|0\rangle$ is the vacuum 
state, and $c(g, \bar{g}) = \sqrt{1 - \tanh^2 g}\, \sqrt{1 - \tanh^2 \bar{g}}$ for $g, \bar{g}$ 
being the two squeezing parameters. The parties Alice and Bob can measure this state by 
placing two detectors after the usual set of wave plates and a polarization beam splitter. If 
the detectors do not resolve the number of incident photons, four cases can then be observed: 
no detection, a click in the first detector, a click in the second detector, or two clicks. In 
the following, we label a click in the first detector as 0, a click in the second detector as 
1, and the case where either no detection or double detections are observed as $\emptyset$, so 
that each party effectively produces one of three possible outcomes. The statistics observed 
in this situation as a function of the polarization measurements and the detection efficiency 
(or equivalently the losses between the source and the detectors) are described in [23]. 

Using the program (2), we are going to compute lower bounds on the extractable randomness 
that can be found in the presence of these statistics in the following cases: 

• (a) All outcomes are considered (no post-selection), i.e. 
$\mathcal{N} = \mathcal{N}_a = \{\}$ (the empty set). 

• (b) The post-selected string of outcomes does not contain double occurrences of 
$\emptyset$, i.e. $\mathcal{N} = \mathcal{N}_b = \{\emptyset\emptyset\}$. 

• (c) The post-selected string of outcomes does not contain any occurrence of a no-detection 
event $\emptyset$, i.e. 
$\mathcal{N} = \mathcal{N}_c = \{0\emptyset, 1\emptyset, \emptyset\emptyset, \emptyset 0, \emptyset 1\}$. 

For the sake of comparison, we will sometimes also consider the case in which the 
measurements are performed only when at least one photon pair is produced by the source, i.e. 

• (h) The source is heralded. 

An example of a heralded experiment is the recent one of Hensen et al. [55]. Note that in this 
particular case the state is encoded in a non-photonic system and always yields a detection 
whenever measured. 


A. Perfect detectors, variable squeezing 

We first consider the case of an experiment with no loss, and with unit efficiency detectors. 
In this case it seems natural to try to generate a maximally entangled state. We thus set 
$g = \bar{g}$ and vary the squeezing $g$. Varying $g$ can also be understood as changing the 
time window $\tau$ during which detectors are monitored. Indeed, the average number of photon 
pairs produced within this window is given by 
$\nu = \sinh^2 g + \sinh^2 \bar{g} = 2 \sinh^2 g$. 
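
As a small numerical aid, the relation between the squeezing and the average pair number can be inverted, and the weight of the vacuum component read off from the normalization factor of the state (3). The sketch below assumes that the vacuum amplitude of the state is exactly c(g, ḡ), so that the vacuum probability is its square; the function names are ours.

```python
import math

def squeezing_from_mean_pairs(nu):
    """Invert nu = 2 sinh^2(g) for the symmetric case g = g_bar."""
    return math.asinh(math.sqrt(nu / 2.0))

def vacuum_probability(g, g_bar):
    """|c(g, g_bar)|^2 = (1 - tanh^2 g)(1 - tanh^2 g_bar), assumed vacuum weight."""
    return (1.0 - math.tanh(g) ** 2) * (1.0 - math.tanh(g_bar) ** 2)

for nu in (0.01, 0.1, 0.6):
    g = squeezing_from_mean_pairs(nu)
    print(f"nu = {nu:4.2f}  g = {g:.4f}  P(vacuum) = {vacuum_probability(g, g):.4f}")
```

For the symmetric case the vacuum probability reduces to 1/(1 + ν/2)², so at small ν almost every time window is empty.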

Figure 1 shows the randomness per run obtained when setting the polarization measurement 
according to the standard CHSH settings. The various discarding strategies yield different 
amounts of certified randomness, the largest amount being obtained using strategy (b). 

One may be tempted to infer that, for randomness extraction, SPDC sources should be operated 
with a detection window at $\nu \approx 0.6$. However, this is the amount of randomness per 
run, not per time. For a given pump power, decreasing the window size $\tau$ decreases the 
average number of photon pairs in a proportional manner: $\nu \propto \tau$. At the same time, 
the number of time windows increases as $\sim 1/\tau \propto 1/\nu$. If $f(\nu)$ denotes the 
randomness rate per time window, the randomness that can be certified in a given time interval 
is thus given, up to a constant factor, by $f(\nu)/\nu$. This quantity is plotted in the inset 
of Figure 1, where one can see that the total amount of randomness certified is larger when 
$\nu$ is small, i.e. when the time window $\tau$ is small. Therefore, in the asymptotic limit 
of infinitely many runs, one should set $\tau \to 0$ to get more randomness per time against 
an i.i.d. adversary. In this case, the observed data set is dominated by double no-detection 
events, which reinforces the relevance of our post-selection approach. The regime of small 
$\nu$ is also the regime in which optical experiments closing the detection loophole have been 
performed [53], for a different reason: in the presence of losses and imperfect detectors, the 
Bell violation disappears if too many pairs are created, while it is preserved in the limit of 
small windows [55]. 



FIG. 1. Randomness from an SPDC source when setting the polarization measurement according 
to the standard CHSH settings, as a function of the average number of photon pairs produced in 
each detection window. No losses and unit efficiency detectors are assumed. The qualitative 
shape of the curves can be understood as follows: for small g, the generated state contains 
mostly the vacuum; for large g, the source generates several pairs, which worsens the 
statistics [55]. Strategies (a), (b) and (c) certify various amounts of randomness. Here and 
in the following figures, all the curves are normalised to the same number of runs, namely the 
total number of runs. Inset: Randomness certified in a given time period when the length of a 
time window varies (and the number of time windows varies accordingly). This curve is obtained 
at constant pumping g. 

B. Imperfect detectors, small squeezing 

For the reasons just mentioned, we focus now on $g, \bar{g} \ll 1$ (i.e. small $\nu$). In 
this case, a large number of no-detection events is expected. In spite of this, we are going 
to see that strategy (b) continues to perform better than the others. Concretely, we choose to 
fix the average number of photon pairs per detection window as $\nu = 0.01$. The state 
produced by the source can be approximated to first order in $g$ and $\bar{g}$ by 

$$
|\psi\rangle \propto |0\rangle + \left(\tanh(g)\, a_H^\dagger b_V^\dagger - \tanh(\bar{g})\, a_V^\dagger b_H^\dagger\right) |0\rangle. \tag{4}
$$

In analogy with the partially entangled state 
$\cos\theta\, |01\rangle - \sin\theta\, |10\rangle$, we define the entanglement parameter of 
the state as $\theta = \arctan(\tanh \bar{g} / \tanh g)$. 

We now introduce finite detection efficiency $\eta$ and study how the certification of 
randomness varies with this parameter. We consider two families of correlations. In the first, 
the two-photon state is maximally entangled, i.e. $\theta = \pi/4$, and we fix the standard 
CHSH polarization measurements. The expected randomness per run as a function of $\eta$ is 
shown in Figure 2. We note that no randomness can be extracted if $\eta < 82.8\%$, which is 
known to be the boundary at which those correlations can be explained with a local model 
exploiting the detection loophole. The second case is that of Eberhard’s famous study [12], in 
which the entanglement parameter $\theta$ depends on the detector efficiency $\eta$, and 
Alice’s measurements are parametrized by two angles $\alpha_0, \alpha_1$ which also depend on 
$\eta$. These parameters are chosen to optimize the violation of a lifting [30] of the CHSH 
inequality, in the case where exactly one pair of photons is measured, for each value of 
$\eta$. The resulting randomness rate is plotted in Figure 3. Again, no randomness can be 
extracted below the known detection loophole threshold, i.e. for $\eta < 66.6\%$. 

FIG. 2. Randomness from a singlet with finite detection efficiency. Curves (b) and (h) 
coincide almost perfectly and approach 0 at the detection loophole limit 0.828. 

FIG. 3. Randomness from Eberhard correlations (horizontal axis: detection efficiency). 
Curves (b) and (h) coincide and approach 0 at the Eberhard limit of 2/3. Two recent 
experiments used these Eberhard correlations. In Ref. [28], the overall efficiencies are 
estimated at 78.6% for Alice and 76.2% for Bob; in Ref. [27], at 74.7% for Alice and 75.6% for 
Bob. Thus, strategies (a) and (b) would extract a very similar (small) amount of randomness. 
If efficiencies are increased in the future, strategy (b) should be preferred. 

In both cases we notice again that, within a numerical precision of $\sim 10^{-5}$, strategy 
(b) certifies the largest amount of randomness and in fact recovers the result that one would 
obtain with a heralded source (h). The expected proportion of discarded events is 
$\sim (1 - \nu) + \nu (1 - \eta)^2$, which can be substantial: it is larger than 99% in our 
case for all $\eta$. Strategy (c), i.e. removing all events where some no-detection occurred, 
results in clearly lower randomness per run; for efficiencies lower than 86% and 85%, 
respectively, no randomness at all is certified. This kind of post-selection is thus too 
strong if one is interested in certifying an optimal amount of randomness. Strategy (a) 
certifies essentially the maximum amount of randomness for efficiencies $\eta < 90\%$, but 
would become suboptimal as efficiency increases. 
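
The size of the discarded fraction under strategy (b) is easy to tabulate. The sketch below evaluates the approximate expression quoted above, (1 − ν) + ν(1 − η)², at the value ν = 0.01 used in this section; the function name and the sampled efficiencies are illustrative choices.

```python
def discarded_fraction(nu, eta):
    """Approximate probability of a double no-detection: (1 - nu) + nu * (1 - eta)^2."""
    return (1.0 - nu) + nu * (1.0 - eta) ** 2

nu = 0.01
efficiencies = (2.0 / 3.0, 0.75, 0.828, 0.9, 1.0)
fractions = {eta: discarded_fraction(nu, eta) for eta in efficiencies}

for eta, fraction in fractions.items():
    print(f"eta = {eta:.3f}  discarded fraction = {fraction:.4f}")
```

Even with perfect detectors, 99% of the runs are discarded, so the string fed to the extractor is about a hundred times shorter than the raw data.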


IV. UNDERSTANDING WHY ONE CERTIFIES 
MORE RANDOMNESS FROM A SUBSET OF 
DATA 


Let us stress again that in Figures 1-3 all the curves are normalised to the same number of 
runs, the total one. Thus, they show that if a suitable small fraction of the symbols is 
processed, a strictly larger amount of total randomness can be certified, as compared to the 
case where all the symbols are processed. In order to shed light on this behavior, we consider 
a simplified model in which the source emits a perfect maximally-entangled state with 
probability $\nu$, and the vacuum otherwise (in other words, compared to the previous section, 
we neglect completely the possibility of double detections in each party’s measurement setup). 
We also work at perfect detection efficiency $\eta = 1$. The statistics observed with such a 
source can be written as 

$$
p(ab|xy) =
\begin{cases}
\frac{\nu}{4}\left(1 + \frac{(-1)^{a+b+xy}}{\sqrt{2}}\right) & \text{if } a, b \in \{0, 1\}, \\
1 - \nu & \text{if } a = b = \emptyset.
\end{cases}
\tag{5}
$$

Notice that, for the source efficiency $\nu = \frac{1}{2}$, these correlations can be seen as 
the scrambled version of the two-day extreme situation mentioned in the introduction. 
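
The link with the two-day example can be checked directly by building the distribution (5) and evaluating its CHSH value with no-detection events mapped to the +1 outcome, as in the introduction. In the sketch below, the label `NO_CLICK` and the function names are our own, and all outcome pairs not listed in (5) are given zero probability.

```python
import math
from itertools import product

NO_CLICK = "no-click"   # stands for the no-detection outcome
OUTCOMES = (0, 1, NO_CLICK)

def p(a, b, x, y, nu):
    """Single-run statistics of Eq. (5); unlisted outcome pairs have probability 0."""
    if a in (0, 1) and b in (0, 1):
        return nu / 4.0 * (1.0 + (-1) ** (a + b + x * y) / math.sqrt(2.0))
    if a == NO_CLICK and b == NO_CLICK:
        return 1.0 - nu
    return 0.0

def chsh_value(nu):
    """CHSH value E00 + E01 + E10 - E11, with outcome 0 and no-click mapped to +1."""
    sign = {0: 1, 1: -1, NO_CLICK: 1}
    value = 0.0
    for x, y in product((0, 1), repeat=2):
        correlator = sum(
            sign[a] * sign[b] * p(a, b, x, y, nu)
            for a, b in product(OUTCOMES, repeat=2)
        )
        value += (-1) ** (x * y) * correlator
    return value

print(chsh_value(1.0))   # perfect singlet: 2*sqrt(2)
print(chsh_value(0.5))   # same average violation as the two-day example
```

At ν = 1/2 the CHSH value equals (2√2 + 2)/2, the average violation of the two-day example, although here the two processes alternate in an i.i.d. fashion.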

In Figure 4 we show how much randomness can be certified for these statistics when $\nu$ 
varies. In this case, the lower bound on the randomness computed from the raw data is 
consistently lower than the one obtained after removing double no-detections from the data. In 
fact, after discarding double no-detections, the same amount of randomness that could be 
certified if the source was heralded is recovered (i.e. it is proportional to the source 
efficiency $\nu$). 

We thus recover the same behaviour as discussed in Section III B and in the two-day example 
of the introduction. If we do not consider it an overwhelmingly improbable fluctuation, the 
two-day example clearly suggests a non-i.i.d. process, for which the possibility of 
identifying two separate processes is easy to understand. Here, on the contrary, the 
statistics are manifestly i.i.d., and nevertheless the extraction of randomness based on the 
single-run frequencies $p(ab|xy)$ can be improved. We are going to show that the cause is the 
same: because of the structure of the correlations, one can actually identify the presence of 
two distinct processes, and the post-selection of detection events happens to capture this 
fact. That the alternation between the two processes is done in an i.i.d. way, instead of a 
disruptive way as in the two-day example, eventually does not matter. 

FIG. 4. Randomness from a singlet produced with finite probability $\nu$, with $\eta = 1$. 
Curves (b) and (c) are identical, since there are no events with one detection and one 
no-detection in the raw data (the post-selection procedures (b) and (c) are actually the same 
for this correlation). Curve (h), which gives the randomness from the raw string of outcomes 
upon the heralding of a successful preparation of the state (i.e. randomness from the 
correlation (5)), exactly coincides with curves (b) and (c). Curve (a) lies below the other 
ones. 


Note first that by definition $p(ab|xy)$ has the block structure 
$p(ab|xy) = \nu\, q(ab|xy) + (1 - \nu)\, r(ab|xy)$, where 

$$
q(ab|xy) =
\begin{cases}
\frac{1}{4}\left(1 + \frac{(-1)^{a+b+xy}}{\sqrt{2}}\right) & \text{if } a, b \in \{0, 1\}, \\
0 & \text{otherwise},
\end{cases}
\tag{6}
$$

$$
r(ab|xy) =
\begin{cases}
1 & \text{if } a = b = \emptyset, \\
0 & \text{otherwise}.
\end{cases}
\tag{7}
$$


It follows that in the decomposition 
$p(ab|xy) = \sum_{a'b' \in \mathcal{V}} \tilde{p}_{a'b'}(ab|xy)$ in the second line of the 
program (2), every $\tilde{p}_{a'b'}(ab|xy)$ must also have this block structure, since if 
$p(ab|xy)$ is equal to zero for some $a, b, x, y$ then $\tilde{p}_{a'b'}(ab|xy)$ must also 
necessarily be equal to zero. We can thus write 
$\tilde{p}_{a'b'}(ab|xy) = \nu_{a'b'}\, q_{a'b'}(ab|xy) + (1 - \nu_{a'b'})\, r(ab|xy)$, where 
$q_{a'b'}(ab|xy)$ is normalized and has the same general form as $q(ab|xy)$ above. The 
condition $p(ab|xy) = \sum_{a'b' \in \mathcal{V}} \tilde{p}_{a'b'}(ab|xy)$ is then equivalent 
to $\sum_{a'b' \in \mathcal{V}} \nu_{a'b'} = \nu$ and 
$\sum_{a'b' \in \mathcal{V}} \nu_{a'b'}\, q_{a'b'}(ab|xy) = \nu\, q(ab|xy)$. 

Furthermore, when we post-select events according to (b) or (c), the effective set of valid 
symbols is in both cases $\mathcal{V} = \{00, 01, 10, 11\}$ since outcome pairs $0\emptyset$, 
$1\emptyset$, $\emptyset 0$, $\emptyset 1$ have zero probability. The objective value in (2) 
therefore only involves the $q_{a'b'}(ab|xy)$ part and is equal to 
$\frac{1}{\nu} \max \sum_{ab \in \mathcal{V}} \nu_{ab}\, q_{ab}(ab|xy)$, where we used that 
$p_{xy} = \sum_{ab \in \mathcal{V}} \nu_{ab} = \nu$. 

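All together, we can thus rewrite the optimization (2) as 

$$
\begin{aligned}
G_{xy} = \frac{1}{\nu} \max \quad & \sum_{ab \in \mathcal{V}} \nu_{ab}\, q_{ab}(ab|xy) \\
\text{s.t.} \quad & \sum_{a'b' \in \mathcal{V}} \nu_{a'b'}\, q_{a'b'}(ab|xy) = \nu\, q(ab|xy), \\
& q_{a'b'}(ab|xy) \in \mathcal{Q},
\end{aligned}
\tag{8}
$$

where $\mathcal{Q}$ denotes the set of normalized quantum correlations. Defining 
$\tilde{q}_{a'b'}(ab|xy) = \frac{\nu_{a'b'}}{\nu}\, q_{a'b'}(ab|xy)$, we can further rewrite 
it as 

$$
\begin{aligned}
G_{xy} = \max \quad & \sum_{ab \in \mathcal{V}} \tilde{q}_{ab}(ab|xy) \\
\text{s.t.} \quad & \sum_{a'b' \in \mathcal{V}} \tilde{q}_{a'b'}(ab|xy) = q(ab|xy), \\
& \tilde{q}_{a'b'}(ab|xy) \in \tilde{\mathcal{Q}}.
\end{aligned}
\tag{9}
$$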

This optimization is nothing but the one associated with a heralded source characterized by 
the correlations $q(ab|xy)$, which explains why curve (h) of Figure 4 coincides with curves 
(b) and (c). 


V. GOING BEYOND I.I.D. FOR THE SOURCE 

In this section, we relax the i.i.d. assumption for the source. We will not be able to derive 
bounds for the extraction of randomness from the most general non-i.i.d. source, but we 
provide two examples of non-i.i.d. strategies that are strictly more powerful than i.i.d. 
strategies even in the asymptotic limit of infinitely many runs. To our knowledge, this is a 
feature not found in previous works on randomness from Bell tests or on quantum key 
distribution [55]. In the strategies we found, the adversary exploits the knowledge of whether 
each outcome is kept or discarded. As mentioned in Section II, it would be perfectly 
reasonable not to reveal anything, but such a scenario may introduce other security concerns 
(e.g. the raw key is private conditional on some other information being kept private). 

Specifically, suppose that the outcomes of run k are valid, i.e. they are kept for the raw 
key; the adversary would like to know their value. In a non-i.i.d. case, the fact of keeping 
or discarding the outcome at run k + 1, a piece of information which we assume the adversary 
will learn, may leak some information about the outcome that is kept at run k. This is similar 
to the argument of [51] against reusing QKD devices in the device-independent level of 
characterization [52]. Notice that this behaviour does not require the adversary to have 
tampered with the device in a malicious way: it may simply be a fabrication defect that the 
adversary is aware of. For instance, suppose that the detector corresponding to outcome 0 has 
an inordinately long jitter time compared to the other detector: if a detection happens at run 
k + 1, it means that the outcome at run k was 1; if no detection happens, the outcome at run k 
was most probably 0. 
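
This kind of leakage can be made concrete with a short simulation of the padding rule of Eq. (10), used in the first example of this section: each singlet outcome (a, b) is followed by 0, 1, 2 or 3 artificial double no-detection runs. The sketch below is only illustrative; the function names, the fixed settings x = y = 0 used for sampling, and the random seed are our own assumptions.

```python
import math
import random

NO_DETECTION = None   # a discarded double no-detection run

# Rule of Eq. (10): number of artificial no-detection runs inserted after outcome (a, b).
PADDING = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
OUTCOME_FROM_GAP = {gap: outcome for outcome, gap in PADDING.items()}

def sample_singlet_outcomes(n, rng):
    """Sample (a, b) from the singlet statistics at settings x = y = 0 (assumed)."""
    s = 1.0 / math.sqrt(2.0)
    outcomes = list(PADDING)
    weights = [(1.0 + (-1) ** (a + b) * s) / 4.0 for a, b in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)

def emit_runs(singlet_outcomes):
    """Interleave real measurement outcomes with the padding prescribed by the rule."""
    runs = []
    for outcome in singlet_outcomes:
        runs.append(outcome)
        runs.extend([NO_DETECTION] * PADDING[outcome])
    return runs

def adversary_guess(kept_flags):
    """Recover every kept outcome from the keep/discard pattern alone.

    Assumes the pattern starts with a kept run, as produced by emit_runs."""
    guesses, gap = [], None
    for kept in kept_flags:
        if kept:
            if gap is not None:
                guesses.append(OUTCOME_FROM_GAP[gap])
            gap = 0
        else:
            gap += 1
    if gap is not None:
        guesses.append(OUTCOME_FROM_GAP[gap])
    return guesses

rng = random.Random(0)
outcomes = sample_singlet_outcomes(10_000, rng)
runs = emit_runs(outcomes)
kept_flags = [run is not NO_DETECTION for run in runs]
guesses = adversary_guess(kept_flags)
valid_fraction = len(outcomes) / len(runs)

print("all outcomes guessed:", guesses == outcomes)
print("fraction of valid runs:", round(valid_fraction, 3))
```

The adversary reconstructs every kept outcome from the positions of the discarded runs, while the fraction of valid runs approaches 2/5, since the average padding is 3/2 runs per measurement.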


A. First example 

The simplest example we found requires both Alice’s and Bob’s devices to depend on the 
previous inputs and outputs of both sides. Note that this is not in contradiction with the 
basic assumption in all device-independent protocols that the two boxes are non-communicating, 
since this assumption must only be verified during the measurement runs. Between measurement 
runs, however, boxes could in principle be free to communicate. For instance, before the 
measurement runs, the boxes may open a door within a small time interval to let in the 
incoming quantum systems generated by the source. Malicious boxes could take advantage of this 
interval to exchange the inputs and outputs obtained in previous runs. In the next subsection, 
we will present a more convoluted example that does not require signalling between the boxes, 
and which thus also works if measures are taken to ensure that the boxes do not exchange this 
kind of information between measurement runs. 

Consider the i.i.d. correlations obtained when the parties measure a singlet with probability 
$\nu$, and nothing with probability $1 - \nu$. We have encountered this situation in Section 
IV: for any $\nu > 0$, some randomness remains in the non-discarded outcomes (see Figure 4). 

In all existing protocols, the amount of randomness that is extracted is determined from a 
statistical test which is based on the input and output pair counts #(x, y) and #(a, b) (or 
simply the relative outcome frequencies #(a, b)/#(x, y)). However, the same statistics 
obtained for $\nu = 2/5$ can be obtained with high probability when measurements are always 
performed on a perfect singlet, but runs with double no-detections are artificially added by 
using the following non-i.i.d. rule: 

    singlet outcomes (a, b)    following runs
    (0,0)                      M
    (0,1)                      (∅,∅) M
    (1,0)                      (∅,∅) (∅,∅) M
    (1,1)                      (∅,∅) (∅,∅) (∅,∅) M          (10)

where M means that a usual measurement is performed on the perfect singlet to determine the 
outcome of that run. In this case, counting the number of successive discarded events fully 
informs about the value of both parties’ outcomes. Thus, in the non-i.i.d. case, and allowing 
signalling from one box to the other between measurement runs, no private randomness can be 
certified from a non-heralded source characterized by $\nu \leq 2/5$ (unless some more 
complicated processing beyond looking at simple outcome counts is done). 


B. Second example 


The second example was found numerically. It is ad¬ 
mittedly hard to find a narrative justification for it, be¬ 
sides the general intuition given above; but we describe 
it in detail since, to our knowledge, it is the first example 
in which a non-i.i.d. strategy actually outperforms the 
i.i.d. ones in a Bell scenario in the asymptotic limit. 

Resources. In each run, Alice and Bob share two binary
variables $\lambda, \mu \in \{0,1\}$ and one out of five quantum
correlations, which we denote by $P_j$ with $j \in \{1,2,3\}$ and
$P'_\lambda$. These correlations are such that Alice’s box has three
outcomes $\{0,1,\emptyset\}$, while Bob’s box has only the two
outcomes $\{0,1\}$: in other words, information about previous
outcomes will be leaked by the detection or
no-detection events of Alice’s box. We can write these correlations as
above in the form of Collins-Gisin tables [55]:

$$
\begin{array}{c|cc}
1 & P_B(0|0) & P_B(0|1) \\ \hline
P_A(0|0) & P(00|00) & P(00|01) \\
P_A(1|0) & P(10|00) & P(10|01) \\ \hline
P_A(0|1) & P(00|10) & P(00|11) \\
P_A(1|1) & P(10|10) & P(10|11)
\end{array}
$$

because by no-signaling it holds that $P(a1|xy) = P_A(a|x) - P(a0|xy)$
and $P(\emptyset b|xy) = P_B(b|y) - P(0b|xy) - P(1b|xy)$;
and of course $\sum_a P_A(a|x) = \sum_b P_B(b|y) = 1$. The example
that we found uses:


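$$
P_1 = \begin{array}{c|cc}
1 & 0.4453 & 0.3121 \\ \hline
0.6570 & 0.1708 & 0.0394 \\
0 & 0 & 0 \\ \hline
0.3244 & 0.0247 & 0.2843 \\
0.4942 & 0.4195 & 0.0277
\end{array} , \tag{11}
$$

$$
P_2 = \begin{array}{c|cc}
1 & 0.8544 & 0.7373 \\ \hline
0 & 0 & 0 \\
0.8919 & 0.8381 & 0.7209 \\ \hline
0.2619 & 0.1165 & 0.2617 \\
0.4973 & 0.4972 & 0.2354
\end{array} , \tag{12}
$$

$$
P_3 = \begin{array}{c|cc}
1 & 0.6042 & 0.5429 \\ \hline
0.3979 & 0.0886 & 0.0365 \\
0.6021 & 0.5156 & 0.5064 \\ \hline
0.4588 & 0.1078 & 0.4267 \\
0.5412 & 0.4964 & 0.1162
\end{array} , \tag{13}
$$

$$
P'_{\lambda=0} = \begin{array}{c|cc}
1 & 0.6663 & 0.2038 \\ \hline
1 & 0.6663 & 0.2038 \\
0 & 0 & 0 \\ \hline
0.2936 & 0.1393 & 0.1112 \\
0.7064 & 0.5270 & 0.0926
\end{array} , \tag{14}
$$

$$
P'_{\lambda=1} = \begin{array}{c|cc}
1 & 0.9996 & 0.0015 \\ \hline
0 & 0 & 0 \\
1 & 0.9996 & 0.0015 \\ \hline
0.0010 & 0.0006 & 0.0004 \\
0.9990 & 0.9990 & 0.0011
\end{array} . \tag{15}
$$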
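
As a sanity check, a Collins-Gisin table can be expanded into the full distribution P(ab|xy) with the no-signaling relations quoted above. The sketch below does this for the table of Eq. (11). The list layout (rows 1, P_A(0|0), P_A(1|0), P_A(0|1), P_A(1|1); columns 1, P_B(0|0), P_B(0|1)), the label `'nd'` for the no-detection outcome and the function name `expand_cg` are assumptions made for this illustration.

```python
def expand_cg(T):
    """Expand a 5x3 Collins-Gisin table into full[(x, y)][(a, b)].

    a ranges over 0, 1 and 'nd' (no detection); b ranges over 0, 1.
    """
    full = {}
    for x in (0, 1):
        for y in (0, 1):
            pB0 = T[0][1 + y]
            pB = {0: pB0, 1: 1.0 - pB0}
            p = {}
            for a in (0, 1):
                row = T[1 + 2 * x + a]
                p[(a, 0)] = row[1 + y]
                # P(a1|xy) = P_A(a|x) - P(a0|xy)
                p[(a, 1)] = row[0] - row[1 + y]
            for b in (0, 1):
                # P(nd b|xy) = P_B(b|y) - P(0b|xy) - P(1b|xy)
                p[('nd', b)] = pB[b] - p[(0, b)] - p[(1, b)]
            full[(x, y)] = p
    return full

# Entries of Eq. (11)
P1 = [[1, 0.4453, 0.3121],
      [0.6570, 0.1708, 0.0394],
      [0, 0, 0],
      [0.3244, 0.0247, 0.2843],
      [0.4942, 0.4195, 0.0277]]

if __name__ == "__main__":
    for xy, dist in expand_cg(P1).items():
        print(xy, {k: round(v, 4) for k, v in dist.items()})
```

Every expanded entry should be non-negative and each conditional distribution should sum to one, which gives a quick consistency test of the tabulated numbers.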


Protocol. One starts with one of the three $P_j$’s. As
long as $j = 1$ or $j = 2$, the next round also uses one
of the three $P_j$’s. When $P_3$ is chosen, the next box
is $P'_\lambda$, with the value of $\lambda$ available in that run. Moreover,
if Alice’s outcome from $P_3$ was $a = \mu$, in the next run
Alice uses the box $P'_\lambda$; if the outcome was $a = 1 - \mu$, in
the next run Alice ignores $P'_\lambda$ and outputs $\emptyset$. After this,
the process starts again by selecting one of the three $P_j$’s.


Now, when $x = 0$, either outcome 0 or outcome 1 cannot
occur for each potential correlation except $P_3$; and
when $P_3$ is used, its outcome is fully leaked in the
next run by the information of whether the subsequent
outcome is kept or not, since $P'_\lambda(\emptyset|x) = 0$.
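
The guessing strategy just described can be sketched as a short simulation. The code below makes several simplifying assumptions that are not in the text: Alice always uses x = 0, only her marginal outcome is sampled, a run counts as kept whenever she records 0 or 1, and the adversary knows the box label, λ and μ of each run. The marginals are read off Eqs. (11)-(15), and the selection frequencies are the ones quoted in the following paragraph. All function and variable names are hypothetical.

```python
import random

# Alice's marginals (P_A(0|x=0), P_A(1|x=0)); the remainder is no detection.
MARG = {"P1": (0.6570, 0.0), "P2": (0.0, 0.8919), "P3": (0.3979, 0.6021),
        "P'0": (1.0, 0.0), "P'1": (0.0, 1.0)}
Q = {"P1": 0.4097, "P2": 0.4992, "P3": 0.0911}   # frequencies of the P_j
P_LAMBDA0 = 0.0013                               # p(lambda = 0)

def sample(box, rng):
    """Sample Alice's outcome at x = 0: 0, 1, or None (no detection)."""
    p0, p1 = MARG[box]
    r = rng.random()
    return 0 if r < p0 else 1 if r < p0 + p1 else None

def run(n_cycles=50_000, seed=0):
    rng = random.Random(seed)
    log = []   # (Alice's outcome, adversary's guess) for every run
    for _ in range(n_cycles):
        box = rng.choices(list(Q), weights=list(Q.values()))[0]
        a = sample(box, rng)
        if box == "P1":
            log.append((a, 0))        # outcome 1 cannot occur under P1
        elif box == "P2":
            log.append((a, 1))        # outcome 0 cannot occur under P2
        else:
            lam = 0 if rng.random() < P_LAMBDA0 else 1
            mu = rng.randrange(2)
            # Next run: Alice uses P'_lambda if a == mu, otherwise outputs nothing.
            follow = sample(f"P'{lam}", rng) if a == mu else None
            kept_next = follow is not None
            log.append((a, mu if kept_next else 1 - mu))
            log.append((follow, lam))  # P'_lambda is deterministic at x = 0
    kept = [(a, g) for a, g in log if a is not None]
    accuracy = sum(a == g for a, g in kept) / len(kept)
    return accuracy, len(kept) / len(log)

if __name__ == "__main__":
    print(run())
```

Under these assumptions the adversary guesses every kept outcome correctly, which is the sense in which the outcome is fully leaked.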


One can check, however, that it would not be possible
to fully guess Alice’s outcome if the same relative
outcome frequencies as those generated by the
above process were produced by devices behaving in an
i.i.d. manner. For instance, let us specify $q_1 = 0.4097$,
$q_2 = 0.4992$, $q_3 = 0.0911$ as the frequencies at which the
$P_j$’s are chosen; and $p(\lambda = 0) = 1 - p(\lambda = 1) = 0.0013$,
$p(\mu = 0) = p(\mu = 1) = 1/2$. The expected relative
frequencies in the asymptotic limit are then peaked around
the following values:


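$$
\frac{q_1 P_1 + q_2 P_2 + q_3 P_3 + q_3\left[\, p(\lambda=0)\,(P'_{\lambda=0} + P'_{\lambda=0,B})/2 + p(\lambda=1)\,(P'_{\lambda=1} + P'_{\lambda=1,B})/2 \,\right]}{q_1 + q_2 + 2 q_3}
= \begin{array}{c|cc}
1 & 0.6919 & 0.5000 \\ \hline
0.2800 & 0.0716 & 0.0178 \\
0.5000 & 0.4681 & 0.3722 \\ \hline
0.2800 & 0.0716 & 0.2621 \\
0.5000 & 0.4681 & 0.1279
\end{array}
\tag{16}
$$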


where $P'_{\lambda,B}$ denotes the correlations obtained when Bob
uses $P'_\lambda$ and Alice outputs $\emptyset$. Applying our i.i.d.
programme to these correlations, one can show that, in case
Alice uses $x = 0$ and the run is not discarded, the
guessing probability on her outcome is upper-bounded
by 0.9874.
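
The averaged table of Eq. (16) can be recomputed from Eqs. (11)-(15) and the quoted frequencies. The sketch below does so under an assumed layout (rows 1, P_A(0|0), P_A(1|0), P_A(0|1), P_A(1|1); columns 1, P_B(0|0), P_B(0|1)). The helper `bob_only`, which builds the table for a run where Bob uses a box while Alice outputs no detection, is a hypothetical name.

```python
P1 = [[1, 0.4453, 0.3121], [0.6570, 0.1708, 0.0394], [0, 0, 0],
      [0.3244, 0.0247, 0.2843], [0.4942, 0.4195, 0.0277]]
P2 = [[1, 0.8544, 0.7373], [0, 0, 0], [0.8919, 0.8381, 0.7209],
      [0.2619, 0.1165, 0.2617], [0.4973, 0.4972, 0.2354]]
P3 = [[1, 0.6042, 0.5429], [0.3979, 0.0886, 0.0365], [0.6021, 0.5156, 0.5064],
      [0.4588, 0.1078, 0.4267], [0.5412, 0.4964, 0.1162]]
PP0 = [[1, 0.6663, 0.2038], [1, 0.6663, 0.2038], [0, 0, 0],
       [0.2936, 0.1393, 0.1112], [0.7064, 0.5270, 0.0926]]
PP1 = [[1, 0.9996, 0.0015], [0, 0, 0], [1, 0.9996, 0.0015],
       [0.0010, 0.0006, 0.0004], [0.9990, 0.9990, 0.0011]]
EQ16 = [[1, 0.6919, 0.5000], [0.2800, 0.0716, 0.0178], [0.5000, 0.4681, 0.3722],
        [0.2800, 0.0716, 0.2621], [0.5000, 0.4681, 0.1279]]

def bob_only(T):
    """Bob uses T, Alice outputs no detection: only Bob's marginals survive."""
    return [list(T[0])] + [[0, 0, 0] for _ in range(4)]

def mix(weighted):
    """Entry-wise weighted sum of 5x3 tables."""
    return [[sum(w * T[i][j] for w, T in weighted) for j in range(3)]
            for i in range(5)]

q1, q2, q3 = 0.4097, 0.4992, 0.0911
pl0 = 0.0013          # p(lambda = 0)
pl1 = 1 - pl0         # p(lambda = 1)

numerator = mix([(q1, P1), (q2, P2), (q3, P3),
                 (q3 * pl0 / 2, PP0), (q3 * pl0 / 2, bob_only(PP0)),
                 (q3 * pl1 / 2, PP1), (q3 * pl1 / 2, bob_only(PP1))])
norm = q1 + q2 + 2 * q3
average = [[v / norm for v in row] for row in numerator]
max_dev = max(abs(average[i][j] - EQ16[i][j])
              for i in range(5) for j in range(3))

if __name__ == "__main__":
    for row in average:
        print([round(v, 4) for v in row])
    print("largest deviation from Eq. (16):", max_dev)
```

The recomputed entries should agree with Eq. (16) up to the rounding of the four-digit inputs.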


VI. CONCLUSION 


This work stems from the general remark that ran¬ 
domness extraction does not need to be performed on 
all of the raw data and can be done by blocks, or on a 
subset of data. In the context of randomness certifica¬ 
tion by Bell inequalities, we have investigated in a simple 
scenario whether this could provide an advantage when 
post-selecting detection events, which is relevant for photonic
Bell tests. Because we estimate the randomness
present in a subset of data conditioned on the knowledge 
of the whole set of data, this certification does not open 
the detection loophole. 

Naively, one could a priori think that “full detection”
events, where a detection happens on both sides, are the
most important for randomness certification and that
discarding all other events would influence the randomness
rate only negligibly. However, our findings show for several
physically-motivated models of the observed statistics
that this is not the case. In particular, Figure [2] and
Figure [3] show that the resistance to detection inefficiencies
is substantially lower (up to 20% for the scenario
of Figure [3]) when the post-selected data does not contain
any occurrence of a no-detection event.

The physical intuition that the double no-detection
events contain almost no randomness is, however,
vindicated. In some cases, the post-selection actually helps
identify a better way of reading the data. From a practical
perspective, our work suggests the possibility of
hashing a small post-selected subset of the original data,
thereby reducing the needed seed length, and ultimately
the computational time. However, one should still embed
this idea within a full randomness certification protocol,
in particular one that can deal with finite statistics and
non-i.i.d. devices.

Regarding this last point, the physical intuition that
double no-detection events can safely be discarded, as
vindicated by our numerical results in an i.i.d. setting,
should, however, be contrasted with the example of
Section [V], in which we prove that non-i.i.d. strategies
outperform i.i.d. ones even in the asymptotic limit of infinitely
many runs, something that had not been reported
previously in the context of Bell inequalities. Whether these
strategies are actually harmful in a more general and
realistic case remains to be determined.

In particular, we recall that for simplicity we have
performed our analysis assuming that the adversary gets
to know which runs are kept and which ones are discarded
in the post-selection. This scenario is rather artificial
for randomness generation, insofar as the two boxes for
the Bell experiment do not need to be in separate labs.
Relaxing this assumption could increase the randomness
rate and the security of the final string. Specifically, the
non-i.i.d. attacks of Section [V] would no longer apply
in this case.